D-HOTM: distributed higher order text mining

Author

William M. Pottenger

Abstract

We present D-HOTM, a framework for Distributed Higher Order Text Mining based on named entities extracted from textual data stored in distributed relational databases. Unlike existing algorithms, D-HOTM requires neither full knowledge of the global schema nor that the distribution of data be horizontal or vertical. D-HOTM discovers rules based on higher-order associations between distributed database records containing the extracted entities. A theoretical framework for reasoning about record linkage is provided to support the discovery of higher-order associations. To handle errors in record linkage, the traditional evaluation metrics employed in association rule mining (ARM) are extended. The implementation of D-HOTM is based on the TMI [29] and was tested on a cluster at the National Center for Supercomputing Applications (NCSA). Results on a dataset simulating an important DEA methamphetamine case demonstrate the relevance of D-HOTM in law enforcement and homeland defense.

Similar resources

Distributed Higher Order Text Mining

The burgeoning amount of textual data in distributed sources combined with the obstacles involved in creating and maintaining central repositories motivates the need for effective distributed information extraction and mining techniques. Recently, as the need to mine patterns across distributed databases has grown, Distributed Association Rule Mining (D-ARM) algorithms have been developed. The...

From HOTs to Self-Representing States

According to David Rosenthal, a mental state is conscious just in case its subject suitably represents herself as being in that state, where this entails that the mental state "is accompanied by a noninferential, nondispositional, assertoric thought to the effect that one is in that very state" (2002a, p. 410; see also Rosenthal, 1997, p. 742). This assertoric thought, since it is about anoth...

Competitive Intelligence Text Mining: Words Speak

Competitive intelligence (CI) has become one of the major subjects for researchers in recent years. The present research aims to cover part of CI by investigating scientific articles in this field through text mining in three interrelated steps. In the first step, a total of 1143 articles released between 1987 and 2016 were selected by searching the phrase "competitive intellige...

A very-short-text clustering method based on distributed representation to identifying research capabilities of a Higher Education Institution

Purpose. Text documents are an important source of data for tech mining techniques. Text databases usually contain documents long enough for conventional text mining techniques to be applied. However, in some tech mining tasks, such as the capability identification process, the database contains very short texts, which pose a challenge for conventional text mining techniques. The problem has to d...

MOMEMI: Modern Methods of Data Mining

Modern data mining is used to classify and to discover relationships in big data sets. The papers presented in the framework of MOMEMI deal with the most important fields of modern data mining: determination and use of patterns and templates, incremental reasoning, geometrical associations, and text mining. Keywords: data mining; classification; forecast; cluster; association...

Publication date: 2007
